International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering An ISO 3297: 2007 Certified Organization Vol. 5, Special Issue 4, March 2016 National Conference on Signal Processing, Instrumentation and Communication Engineering (SPICE' 16) Organized by Dept. of ECE, Mar Baselios Institute of Technology & Science (MBITS), Kothamangalam, Kerala-686693, India # Implementation of an APC-OMS Based LUT Multiplier for DSP Applications Liya Paul<sup>1</sup>, Reen Paul<sup>2</sup> PG Scholar, Dept. of ECE, College of Engineering, Munnar, Kerala, India<sup>1</sup> Assistant Professor, Dept. of ECE, College of Engineering, Munnar, Kerala, India<sup>2</sup> ABSTRACT: Computing in memory or processing in memory had made popular which attempted to reduce the Von-Neumann bottleneck. In memory-based computing, instead of actual computations the computational functions are done by lookup-tables (LUTs). It includes multiplication with fixed set of coefficients. They have the properties like simpler and regular as compared with multiply accumulate structures. LUT optimization for memory-based multiplication can be done with three memory based computation techniques like, antisymmetric product coding (APC), odd-multiple-storage (OMS) and APC-OMS. This paper compares the area, power and delay trade off between APC, OMS, and APC-OMS LUT based multiplier. The proposed memory based computing techniques APC, OMS, APC-OMS LUT based multiplier are coded in Verilog HDL and synthesized in Xilinx ISE Design suite 14.2 and Cadence 90nm library. And the proposed novel APC-OMS technique can optimize the memory usage and also we can save the area, power and delay through this method which is very advantageous to the DSP applications. KEYWORDS: Cadence, Digital Signal Processing (DSP), lookup- table (LUT), memory based computing, Verilog. #### I. INTRODUCTION The fixed coefficient multiplication plays a vital role in digital signal processing (DSP) applications. Memory based computing is applicable for most of the digital signal processing (DSP) algorithms, that uses multiplication with a fixed set of coefficients. It is a class of dedicated systems, and the computational functions are performed by look-up tables (LUTs), instead of actual calculation [1] [2]. Memory based computing is more effective which partition the input into smaller partitions and represent these individual partitions using LUTs. The unique feature of a memory based hardware is that, fabrics for computation and data storage are same, i.e. memory. Hence the delay and energy associated with data into and out from the memory is completely eliminated [3]. LUT optimization for memory-based multiplication can be done with memory based computation techniques namely, antisymmetric product coding (APC), odd-multiple-storage (OMS) and APC-OMS. In odd-multiple-storage (OMS) only the odd multiples of the fixed coefficient are needed to be store in LUT [4] [5], while in antisymmetric product coding (APC) approach, the product words are recorded as antisymmetric pairs [6]. By these two techniques it results in the reduction of the LUT size by a factor of two. And in the APC-OMS combined approach it shows a reduction in LUT size to one-fourth of the conventional LUT. Since the input address and LUT output are always mapped into odd integers and the two's complement operations would be very much simplified in the combined approach of APC-OMS technique. ### II. MEMORY BASED COMPUTATION TECHNIQUES ### $1. \ Antisymmetric \ product \ coding \ (APC) \ technique \ for \ LUT \ Optimization$ The product words for different values of X for word length L=5 is shown in Table I. In this table, we can see that the input word X in the first column of each row is the two's complement as that of the third column of the same row. And also the sum of product values corresponding to these two input values in the same row is 32A [7]. Here we are considering the X (input) and A (fixed coefficient) be the positive integers. Let the product values in the second column of the row be u and fourth columns of a row be v. From the table, we can see that (u+v)=32A, and u and v can be found out by the following equation $u = \frac{[\mathbf{u} + \mathbf{v}]}{2} + \frac{[\mathbf{v} - \mathbf{u}]}{2} \text{ and } \mathbf{v} = \frac{[\mathbf{u} + \mathbf{v}]}{2} - \frac{[\mathbf{v} - \mathbf{u}]}{2}$ $\tag{1}$ #### International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering An ISO 3297: 2007 Certified Organization Vol. 5, Special Issue 4, March 2016 #### National Conference on Signal Processing, Instrumentation and Communication Engineering (SPICE' 16) #### Organized by Dept. of ECE, Mar Baselios Institute of Technology & Science (MBITS), Kothamangalam, Kerala-686693, India Substituting (u + v) = 32A, then we can rewrite the Eq.1 by $$u = 16A + \frac{[v-u]}{2}$$ and $v = 16A - \frac{[v-u]}{2}$ (2) In Equation 2 a negative symmetry can be seen on u and v. So we can decrease the LUT size to half by only storing the [v-u]/2 for a pair of input on the same row due to this behavior. And for X=(00000), the APC word stored already as 16A.Since the product is derived from the antisymmetric behavior of the products, hence the technique is named as an antisymmetric product code. The 4 bit address X'=(x3', x2', x1', x0') can be accessed by the following relation in such a way that X'=XL for X=X and X=X' for X=X for X=X where X=X is the four bit LSB of X and the two's complement of X is X=X. The product word is obtained by the following equation Product word = $$(16A + (sign value) (APC word))$$ (3) If mean significant bit =1 then the sign value will be 1 and if mean significant bit =0 then the sign value will be -1. For the addition and subtraction operation we are using adder circuit and the operations are done between 16A and LUT output. 2. Odd multiple storage (OMS) technique for LUT optimization The product words for different values of input X for L=5 for an OMS based LUT multiplier are shown in Table 2. In OMS technique only( $2^L/2$ )words are stored in the LUT i.e. only the odd multiples of the fixed coefficient (A) are stored, and all the even multiples of A would be derived from the left-shift operations of one of those odd multiples [5]. The OMS word 2A is already stored in LUT for the input X= (00000). 3. APC-OMS technique for LUT optimization The product words for different values of input X for L = 5 of an APC-OMS based LUT multiplier are shown in Table 3. Here only nine memory locations are required to store the product values including for the input X = (00000). In Table II it can be noticed that, only eight memory locations are used to store the eight odd multiples. By the left-shift operations on fixed coefficient A even multiples 2A, 4A, and 8A can be derived. Table 1: APC Word for Different Input Values for L=5 | Input, X | Product<br>values (u) | Input, X | Product<br>values (v) | Address<br>x3'x2'x1'x0' | APC words | |----------|-----------------------|----------|-----------------------|-------------------------|------------| | 00001 | A | 11111 | 31A | 1111 | 15A | | 00010 | 2A | 11110 | 30A | 1110 | 14A | | 00011 | 3A | 11101 | 29A | 1101 | 13A | | 00100 | 4A | 11100 | 28A | 1100 | 12A | | 00101 | 5A | 11011 | 27A | 1011 | 11A | | 00110 | 6A | 11010 | 26A | 1010 | 10A | | 00111 | 7A | 11001 | 25A | 1001 | 9A | | 01000 | 8A | 11000 | 24A | 1000 | 8A | | 01001 | 9A | 10111 | 23A | 0111 | 7 <b>A</b> | | 01010 | 10A | 10110 | 22A | 0110 | 6A | | 01011 | 11A | 10101 | 21A | 0101 | 5A | | 01100 | 12A | 10100 | 20A | 0100 | 4A | | 01101 | 13A | 10011 | 19A | 0011 | 3A | | 01110 | 14A | 10010 | 18A | 0010 | 2A | | 01111 | 15A | 10001 | 17A | 0001 | A | | 10000 | 16A | 10000 | 16A | 0000 | 0 | #### International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering An ISO 3297: 2007 Certified Organization Vol. 5, Special Issue 4, March 2016 # National Conference on Signal Processing, Instrumentation and Communication Engineering (SPICE' 16) Organized by Dept. of ECE, Mar Baselios Institute of Technology & Science (MBITS), Kothamangalam, Kerala-686693, India Table 2: OMS Words for Different Input Values for L=5 | Input(X) | Product | No. of | Shifted | Address | Stored value | |----------|---------|--------|---------|---------|--------------| | | value | shifts | input | | | | 00001 | A | 0 | | | | | 00010 | 2A | 1 | | | | | 00100 | 4A | 2 | 00001 | 00000 | A | | 01000 | 8A | 3 | | | | | 10000 | 16A | 4 | | | | | 00011 | 3A | 0 | 00011 | 00001 | 3A | | 00110 | 6A | 1 | | | | | 01100 | 12A | 2 | | | I | | 11000 | 24A | 3 | | | 1 | | 00101 | 5A | 0 | 00101 | 00010 | 5A | | 01010 | 10A | 1 | | | | | 10100 | 20A | 2 | | | | | 00111 | 7A | 0 | 00111 | 00011 | 7A | | 01110 | 14A | 1 | | | | | 11100 | 28A | 2 | | | | | 01001 | 9A | 0 | 01001 | 00100 | 9A | | 10010 | 18A | 1 | | | | | 01011 | 11A | 0 | 01011 | 00101 | 11A | | 10110 | 22A | 1 | | | | | 01101 | 13A | 0 | 01101 | 00110 | 13A | | 11010 | 26A | 1 | | | | | 01111 | 15A | 0 | 01111 | 00111 | 15A | | 11110 | 30A | 1 | | | | | 10001 | 17A | 0 | 10001 | 01000 | 17A | | 10011 | 19A | 0 | 10011 | 01001 | 19A | | 10101 | 21A | 0 | 10101 | 01010 | 21A | | 10111 | 23A | 0 | 10111 | 01011 | 23A | | 11001 | 25A | 0 | 11001 | 01100 | 25A | | 11011 | 27A | 0 | 11010 | 01101 | 27A | | 11101 | 29A | 0 | 11101 | 01110 | 29A | | 11111 | 31A | 0 | 11111 | 01111 | 31A | | 00000 | 0 | 4 | | 10000 | 2A | Table 3: APC-OMS Words For Different Input Values For L=5 | Input,<br>X | Input,<br>X | Input,<br>(X') | Product<br>value | No: of<br>shifts | Shifted<br>input | Address | Stored value | |-------------|-------------|----------------|------------------|------------------|------------------|---------|--------------| | 00001 | 11111 | 0001 | A | 0 | | | | | 00010 | 11110 | 0010 | 2A | 1 | 0004 | | | | 00011 | 11101 | 0100 | 4A | 2 | 0001 | 0000 | A | | 00100 | 11100 | 1000 | 8A | 3 | | | | | 00101 | 11011 | 0011 | 3A | 0 | 0011 | 0001 | 3A | | 00110 | 11010 | 0110 | 6A | 1 | | | | | 00111 | 11001 | 1100 | 12A | 2 | | | | | 01000 | 11000 | 0101 | 5A | 0 | 0101 | 0010 | 5 <b>A</b> | | 01001 | 10111 | 1010 | 10A | 1 | | | | | 01010 | 10110 | 0111 | 7A | 0 | 0111 | 0011 | 7A | | 01011 | 10101 | 1110 | 14A | 1 | | | | | 01100 | 10100 | 1001 | 9A | 0 | 1001 | 0100 | 9A | | 01101 | 10011 | 1011 | 11A | 0 | 1011 | 0101 | 11A | | 01110 | 10010 | 1101 | 13A | 0 | 1101 | 0110 | 13A | | 01111 | 10001 | 1111 | 15A | 0 | 1111 | 0111 | 15A | | | | 10000 | 16A | | | | | | | | 00000 | 0 | 3 | | 1000 | 2A | | | | | | | | | | #### International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering An ISO 3297: 2007 Certified Organization Vol. 5, Special Issue 4, March 2016 National Conference on Signal Processing, Instrumentation and Communication Engineering (SPICE' 16) Organized by Dept. of ECE, Mar Baselios Institute of Technology & Science (MBITS), Kothamangalam, Kerala-686693, India #### III. IMPLEMENTATION OF MEMORY BASED COMPUTATION TECHNIQUES #### 1.APC based LUT multiplier Figure 1 shows the block diagram of APC based LUT multiplier for L=5.It consists of a four-input LUT containing 16 words for to store the APC values of product words as in the sixth column of Table I. In addition, it consists of an address mapping circuit and an add/subtract block. The address circuit will provide the desired address (x3'x2'x1'x0') according to (2). Then the LUT output is derived according to the address and then the LUT output is added to or subtracted from 16A according to the MSB .When the MSB = 1, the LUT output is added with 16A and if x4=0, the LUT output is subtracted from 16A. Fig. 1: APC based LUT multiplier for L=5 #### 2. OMS based LUT multiplier Figure 2 shows the block diagram of OMS based LUT multiplier for L=5,which consists of a four-input LUT containing 32 words to store the APC values of product words as in the sixth column of Table II. In addition, it have an address-mapping circuit which will map the five bit input to the five bit address, a shifter to produce the actual product value and the control circuit is to generate the control bits like s0, s1, s2 for producing the desired number of shifts for the shifter. The control signals are generated by the following expression. $$S0 = (x1(\sim x0)) + ((\sim x0) X3(\sim x2)$$ (4) $$S1 = ((\sim x0) X3(\sim x1)) + ((\sim x0) x2(\sim x1))$$ (5) $$S2 = ((\sim x0)) ((\sim x1)) ((\sim x2))$$ (6) #### 3. APC-OMS based LUT multiplier Figure 3 shows the block diagram of APC-OMS based LUT multiplier for L=5. It consists of a LUT of nine words with 4-bit width, a shifter, an address generating circuit, and a control circuit. The address generation circuit will map the 5-bit input in to 4-bit address word (d3d2d1d0). The control circuit is used to generate the control bits s0, s1 for to producing the number of shifts for the shifter. The output is obtained in such a way that if MSB=0, the output will be the same product word from the shifter and if MSB=1, the product is obtained by addition16A and the shifted value. The control signals are generated by the following expression $$S0 = (\sim (x0 + (\sim (x1 + (\sim x2))))$$ (7) $$S1 = \sim (x0 + x1) \tag{8}$$ #### International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering An ISO 3297: 2007 Certified Organization Vol. 5, Special Issue 4, March 2016 ## National Conference on Signal Processing, Instrumentation and Communication Engineering (SPICE' 16) Organized by Dept. of ECE, Mar Baselios Institute of Technology & Science (MBITS), Kothamangalam, Kerala-686693, India Fig. 2: OMS based LUT multiplier for L=5 Fig. 3: APC-OMS based LUT multiplier for L=5 #### IV. SIMULATION RESULTS The APC, OMS, APC-OMS based LUT based multiplier are coded in Verilog HDL and synthesized in Xilinx ISE design suite 14.2 and Cadence 90nm Technology. Figure 4, 5, 6 shows the output waveforms of the three techniques that are obtained from the Xilinx Design Suite 14.2. and the Table 4 shows the comparison of three techniques on the factors area and speed that are obtained from Cadence 90nm digital flow. Fig.4: Simulation result of APC based LUT multiplier for L=5 #### International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering An ISO 3297: 2007 Certified Organization Vol. 5, Special Issue 4, March 2016 National Conference on Signal Processing, Instrumentation and Communication Engineering (SPICE' 16) Organized by Dept. of ECE, Mar Baselios Institute of Technology & Science (MBITS), Kothamangalam, Kerala-686693, India Fig. 5: Simulation result of OMS based LUT multiplier for L=5 Fig. 6: Simulation result of APC- OMS based LUT multiplier for L=5 Table 4: Comparison of APC, OMS, APC-OMS Based LUT Multiplier | Parameters | APC | OMS | APC-OMS | | |------------|-----------|-----------|-----------|--| | Area(µm²) | 894.701 | 956.09 | 767.693 | | | Power(nW) | 48522.701 | 57196.883 | 45792.883 | | | Delay(ps) | 1207.9 | 747.92 | 616.3 | | #### V. CONCLUSION The combined approach of modified APC-OMS LUT based multiplier shows savings on area, power and speed as compared with other techniques. For the APC-OMS based LUT multiplier it offers more than 14.19%, 5.6% and 48.9% respectively in a savings of area, power and delay over the APC based LUT based multiplier. When APC-OMS based LUT multiplier compared with OMS based LUT multiplier, it offers more than 19.70%, 19.93% and 17.59% of respectively in savings of area, power and delay. So the APC-OMS based LUT multiplier can be used to implement for DSP applications like in interpolator designs. Further work can also be done in signed numbers. Further work can also be done in digital signal processing applications which can increase the speed of operation and decrease the hardware complexity. #### REFERENCES - [1] R R.Schaller," Technological innovation in the semiconductor industry: a case study of the international technology roadmap for semiconductors (itrs)," Ph.D. dissertation, George Mason University, 2004. - [2] P. K. Meher, "Memory-based hardware for resource-constraint digital signal processing system," in Information, Communications & Signal Processing, 2007 6th International Conference on .IEEE, 2007, pp.1–4. - [3] P.K. Meher, "Lut optimization for memory-based computation," Circuits and Systems II: Express Briefs, IEEE Transactions on,vol. 57, no. 4, pp. 285–289, 2010. - [4] P. K. Meher, "New approach to LUT implementation and accumulation for memory-based multiplication," in *Proc. IEEE ISCAS*, May 2009, pp. 453–456. #### International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering An ISO 3297: 2007 Certified Organization Vol. 5, Special Issue 4, March 2016 ## National Conference on Signal Processing, Instrumentation and Communication Engineering (SPICE' 16) Organized by Dept. of ECE, Mar Baselios Institute of Technology & Science (MBITS), Kothamangalam, Kerala-686693, India - [5] P. K. Meher, "New approach to look-up-table design and memory-based realization of fir digital filter," Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 57, no. 3, pp. 592–603, 2010. - [6] P. K. Meher, "New look-up-table optimizations for memory-based multiplication," in Integrated Circuits, ISIC'09. Proceedings of the 2009 12<sup>th</sup> International Symposium on. IEEE, 2009, pp. 663–666.P. K. Visscher, "How Self-Organization Evolves," Nature, vol. 421, pp. 799–800 Feb.2003. - [7] P. K. Meher, "Lut-based circuits for future wireless systems," in Circuits and Systems (MWSCAS), 2010 53rd IEEE International Midwest Symposium on. IEEE, 2010, pp. 696–699.